Method and Apparatus for Context-Based Content Recommendation

ABSTRACT

Starting with the people in and around enterprises, the expertise and work patterns stored in people's brains, as exhibited in their daily behavior, are detected and captured. A behavior-based knowledge index is thus created and used to produce expert-guided, personalized information.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 12/188,85, filed Aug. 8, 2008, which claims priority to U.S. provisional patent application Ser. No. 60/954,677, filed Aug. 8, 2007, and is a continuation-in-part of U.S. patent application Ser. No. 11/319,928, filed Dec. 27, 2005, which application claims priority to U.S. provisional patent application Ser. No. 60/640,872, filed Dec. 29, 2004, all of which are incorporated herein in their entirety by this reference thereto.

BACKGROUND OF THE INVENTION

1. Technical Field

The invention relates to electronic access to information. More particularly, the invention relates to a method and apparatus for context-based content recommendation.

2. Description of the Prior Art

One problem with finding information in an electronic network concerns how people are connected as quickly and effectively as possible with the information/products/services that meet their needs. This has been one of the main goals of Web pages and search engines since the beginning of the World Wide Web. Failure to do so leads to lost business in the case of eCommerce, eTravel, and eMarketing sites; frustrated customers on eSupport sites who likely then call customer support, thus wasting a lot of the company's money; disinterested viewers/readers on eMedia sites who quickly abandon the site, thus losing opportunities for advertising revenue; and unproductive employees on intranets.

Web site design is a manual attempt to solve the problem of information discovery: to organize information in a way that the designer imagines helps a user find what they are looking for. While effective in some cases, trying to find information in this way often is slow and ineffective as users resort to poking around a site looking for the information they need. Most users actually abandon a site if they do not find what they are looking for within three clicks. One problem is that the site is static. In more recent years, Web analytics has emerged as an attempt to alleviate this problem. Designers can see all of the actions that happen on their site and collect them into reports that aim to provide some guidance on how the site can be redesigned or reconfigured more effectively. While providing some benefit, the information provided is often ambiguous and provides only hints rather than concrete suggestions for improvement. At best, the process is tedious, requires a great deal of manual effort as designers redesign the site in line with learnings, and takes a long time. The feedback loop is thus slow and ineffective.

Automatic content recommendation is a completely different strategy that emerged very early in the life of the Web. Search engines, such as Google, Yahoo, and Ask, are the common manifestation of such techniques. The basic idea is that the user explicitly describes what they are looking for in the form of a search query, and an automatic process attempts to identify the piece of content, most often a Web page, that best matches their query. The approach for doing this amounts to looking at all possible documents and recommending those where the target query occurs within the text with highest frequency, i.e. keyword match. Modern adaptations of this basic technique add layers of sophistication, e.g. natural language processing, but the key in these approaches is still to use properties of the content itself, e.g. words within the document, to determine the ultimate relevancy ranking. This represents the first content-centric phase of content recommendation (see FIG. 1).

Many variations to this approach exist including, most notably, meta-tagging. In this approach, the content creator selects a small number of terms to describe the content. These terms are embedded within the content, often as HTML meta-tags, but are not necessarily made visible to the consumer of the content. This is one way to allow search engines to search content that is not text-based, such as video clips. This approach was very common in the late 1990s, but has since fallen out of favor due to the enormous effort required to keep the meta-tags up to date and in sync with changes to the content.

In many ways, this first content-centric approach makes a lot of sense on the surface, i.e. if you want to recommend content, consider the content itself. A key problem with this approach is that it often brings back lots of documents that may be relevant but not useful. Many documents may exhibit a strong keyword match, but are outdated or not truly relevant to the user's current interest. If users do not find a useful result within the first few results, they are most likely going to abandon the search.

Keyword match does not really reflect how we find information most efficiently in the real world. In day-to-day life, the best way to find the information/products/services we are looking for is to ask someone who knows to point us in the right direction. The second phase of content recommendation thus shifts the focus from content to users (see FIG. 1). Google's “PageRank” algorithm, though we place it in phase 1, was really a transitional technology that heralded the coming of phase 2. The PageRank algorithm's breakthrough was to consider not only the content of the page itself, but how it had been linked to from other pages by other Web site designers. This represented a form of voting on the importance of Web pages. Thus, pages that were linked to more often were seen as more valuable. While bringing people into the equation, the people who were voting were Web designers rather than the consumers of the content, i.e. the users. Phase 2 of content recommendation is all about the users. The three most well known approaches that fall into phase 2 are: folksonomy, profiling/behavioral targeting, and collaborative filtering.

Folksonomy

The first, folksonomy, represents the most straightforward addition to phase 1. Here, users are allowed to tag content themselves. So, rather than the Web site designers, or a single designer, being responsible for coming up with the best set of keywords to describe the content, folksonomy lets the community do it. Once this is done, those community-created tags essentially become part of the content and can be searched using traditional information retrieval/search techniques developed in phase 1. A big assumption in this approach is that the subset of the community who take the time to tag the pages explicitly ultimately produce a description that is valid and representative of the larger community's opinion. This is often not the case.

Profiling/Behavioral Targeting

Profiling/Behavioral targeting in its common form also borrows heavily from phase 1 techniques. Here, based on a user's prior behavior on a site, e.g. the pages clicked or products purchased, a profile is built for that user. This profile may, in the simple case, be based on a collection of pages clicked or products purchased. The profile may also make use of the content itself or meta-tags to attempt to discern the user's historical topics of interest. For example, if a user purchased many films tagged as “horror” by content providers in the past, then a behavioral targeting system would tend to recommend more “horror” films to the user. A major assumption here is that a user's historical behaviors are a good predictor of future interest. While sometimes true, this assumption tends to fail at least as often as it works. The reason for failure is that people exhibit a variety of behaviors depending on their current interests, context, and goals. For example, someone who bought a few books on guitar as a one-time gift for his wife a few weeks ago might continue to be recommended guitar books by a behavioral targeting approach, even though he may no longer have interest in that topic. Profiling approaches often also take into account demographic data of users, such as age, gender, and geographic location. The core belief underlying such approaches is: If I only knew enough about a user I could predict exactly what they want. However, some basic introspection uncovers the fallacy underlying this approach. For example, I may know more about my wife than any person or machine. I am in this way the ideal profiling system for her. However, I am unable to predict what she might be currently looking for online without some context.

Collaborative Filtering

Collaborative filtering is another user-centric approach which is arguably the most strictly user-centric. Here, users are compared to one another based on common purchases, click histories, or explicit ratings. For example, based on a person's previous ratings of movies on a movie site, find other people who most agree with that person's ratings and recommend other movies that they liked. Standard “people who bought this also bought that” approaches are actually a variation on the collaborative filtering approach, where a user's most recent action serves as the sole basis for identifying similar users. This approach was made popular by Amazon's recommendation engine. A big assumption in this approach is that some global similarity measure between users based on past behavior is a useful way to predict future interest. This is a flawed assumption, however. One may be very similar to some of one's co-workers in a work context, e.g. they are all Java engineers with similar interests regarding programming, but quite different from these co-workers when outside of the office, on the golf course for instance. In the context of golf, one likely has a very different peer group. Grouping users at a global level is more often misleading than helpful.

Another weakness in all of the user-centric approaches in phase 2 is the reliance on either explicit measures of liking or overly simplistic implicit measures. Explicit measures include asking the user to indicate their liking of a particular piece of content, e.g. on a 1-5 scale. Such approaches are almost always biased because they represent a very small percentage of the population. Further, the people who are taking the time to do these ratings are not representative of the community as a whole. They tend to be very opinionated or reflect a specific personality type that is willing to spend the time to voice their opinion.

Those approaches that leverage implicit observations as a rule either look at clicks or purchases. Clicks are a flawed way to assess liking because getting someone to click on a result has a lot more to do with an intriguing, perhaps even ambiguous, title and location on the page. It tells one nothing about how a user felt about the content once it is selected for viewing. At the other extreme, many systems use purchases as a measure of liking. While purchases are a reasonable way to assess this, they are too limited. For example, when buying a camera, one may seriously consider a number of products before making a decision. All of that information could be valuable to others interested in cameras, above and beyond the one ultimately purchased.

SUMMARY OF THE INVENTION

An embodiment of the invention represents Phase 3 in the evolution of content recommendation (see FIG. 1). Here, the idea is to start by understanding the current user's context, i.e.: What is their intent? What are they looking for? Based on this understanding, then find the appropriate peer group representing other users who are most like the current user in the context of this identified interest. From there, find the content that that peer group identifies as most relevant to the current context.

The approach taken in the invention is context-centric or, put another way, intent-centric. The techniques used to achieve this approach are described later and are fundamentally based on the UseRank technology and affinity engine described, in part, in U.S. patent application Ser. No. 11/319,928, filed Dec. 27, 2005, which is incorporated herein in its entirety by this reference thereto. It should be noted that all previous approaches, including the content-centric and user-centric approaches, are subsumed and improved by the Phase 3 approach manifested in the invention. Because the invention adds the dimension of context to the picture on top of users and content, it is always possible to choose to ignore context and use the system to provide phase 2 functionality, such as collaborative filtering or behavioral targeting/profiling. However, even these previously known approaches are significantly improved in their functionality based on a critical aspect of the invention which provides full-spectrum behavioral fingerprints.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic overview of the evolution of content recommendation leading up to the invention;

FIG. 2 is an architectural schematic diagram showing a method and apparatus for context-based content recommendation according to the invention;

FIG. 3 is a schematic architectural diagram showing affinity engine integration according to the invention;

FIG. 4 is a schematic architectural diagram showing a dynamic, adaptive, real-time platform of community wisdom and community guided Web according to the invention;

FIG. 5 is an architectural schematic diagram showing a community wisdom platform according to the invention;

FIG. 6 is a schematic flow diagram showing the process of translating context to recommendations;

FIG. 7 is a schematic flow diagram showing the process for achieving a virtual folksonomy according to the invention;

FIG. 8 is a schematic diagram showing topic attraction, topic match, and activeness according to the invention;

FIG. 9 is a schematic flow diagram showing the behavioral fingerprinting process according to the invention;

FIG. 10 is a graph showing a long tail marketing model with an architectural overlay according to the invention;

FIG. 11 is a “Wisdom of Crowds” pseudo-equation;

FIG. 12 is a schematic flow diagram showing an affinity engine memory prediction machine according to the invention;

FIG. 13 is a screenshot of the welcome screen of the customer portal according to the invention;

FIG. 14 is a screenshot of a lift report within the customer portal according to the invention;

FIG. 15 is a screenshot showing community-guided e-commerce according to the invention, including recommendations for competing and complementary products;

FIG. 16 is a screenshot showing community-guided e-travel according to the invention;

FIG. 17 is a screenshot showing community-guided marketing according to the invention;

FIG. 18 is a screenshot showing social search and community information popup/overlay according to the invention;

FIG. 19 is a screenshot showing community-guided online support according to the invention;

FIG. 20 is a screenshot showing a community intranet/knowledge portal according to the invention;

FIG. 21 is a screenshot showing community-guided media according to the invention;

FIG. 22 is a screenshot showing community-guided social search for media according to the invention;

FIG. 23 is a screenshot and schematic diagram showing community-guided cross-site recommendations for media according to the invention;

FIG. 24 is a screenshot showing community-guided video recommendations according to the invention;

FIG. 25 is a screenshot showing community-guided topic recommendations according to the invention;

FIG. 26 is a schematic flow diagram showing the community-guided context-relevant Ad recommendations according to the invention;

FIG. 27 is a schematic flow diagram showing a live connection according to the invention;

FIG. 28 is a block schematic diagram showing the system architecture of a preferred embodiment of the invention;

FIG. 29 is a block schematic diagram showing code snippets of the client integration through JavaScript tags and REST in a preferred embodiment of the invention; and

FIG. 30 is a block schematic diagram showing an AJAX platform according to the invention.

DETAILED DESCRIPTION OF THE INVENTION

The invention represents Phase 3 in the evolution of content recommendation (see FIG. 1). Here, the idea is to start by understanding the current user's context, i.e.: What is their intent? What are they looking for? Based on this understanding, then find the appropriate peer group representing other users who are most like the current user in the context of this identified interest. From there, find the content that that peer group identifies as most relevant to the current context.

The approach taken in the invention is context-centric or, put another way, intent-centric. The techniques used to achieve this approach are described later and are fundamentally based on the UseRank technology and affinity engine described, in part, in U.S. patent application Ser. No. 11/319,928, filed Dec. 27, 2005. It should be noted that all previous approaches, including the content-centric and user-centric approaches, are subsumed and improved by the Phase 3 approach manifested in the invention. Because the invention adds the dimension of context to the picture, on top of users and content, it is always possible to choose to ignore context and use the system to provide collaborative filtering or profiling. However, even these previously known approaches are significantly improved in their functionality based on a critical aspect of the invention which provides full-spectrum behavioral fingerprints.

FIG. 2 is an architectural schematic diagram showing a method and apparatus for context-based content recommendation according to the invention.

Full-spectrum behavioral fingerprints provide a significant advancement over current state-of-the-art implicit ratings, which essentially amount to click analysis and purchase behavior. First, they take into account a wide variety of user behaviors including, but not limited to, clicks, time spent on a page, scrolling and mouse movement, explicit actions such as print, email, bookmark, links used, frequency of return visits, and searches performed. Second, all of these behaviors can be cross-correlated with the current user's behaviors on other pages, and also with the rest of the community's and identified peers' behaviors on the current page. From this analysis, a probability that the user has found value in a particular piece of content can be discerned and fed into the learning system.
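By way of illustration only, the following Python sketch shows one way several observed behaviors could be folded into a single probability-like score. The logistic form, the signal names, the weights, and the bias are all assumptions made for this example; the specification does not state how the probability is computed, and the cross-correlation against other pages and against the community is not modeled here.

```python
import math

# Hypothetical per-behavior weights; the source does not give numeric values.
SIGNAL_WEIGHTS = {
    "clicked": 0.5,
    "dwell_seconds": 0.02,
    "scroll_fraction": 1.0,
    "printed": 2.0,
    "emailed": 2.0,
    "bookmarked": 2.0,
    "return_visits": 0.5,
}
BIAS = -3.0  # assumed offset so that a page with no observed behavior scores low


def value_probability(signals):
    """Map the behaviors observed on one page to a score in (0, 1).

    `signals` maps a behavior name to its observed amount (1 for a
    one-off action such as a bookmark, seconds for dwell time, and so on).
    Unknown behavior names are ignored.
    """
    score = BIAS
    for name, amount in signals.items():
        score += SIGNAL_WEIGHTS.get(name, 0.0) * float(amount)
    # Logistic squashing is an assumed choice, not taken from the source.
    return 1.0 / (1.0 + math.exp(-score))


click_only = value_probability({"clicked": 1})
engaged = value_probability(
    {"clicked": 1, "dwell_seconds": 90, "scroll_fraction": 0.9, "bookmarked": 1}
)
```

In this sketch a bare click yields a low score while a click followed by sustained reading and a bookmark yields a high one, which mirrors the contrast the passage draws with click-only analysis.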

A further aspect of the invention concerns the seamless integration of existing strategies for automatic content recommendation (see FIG. 3), including search engines 301, profiling systems 302, and ad servers 303, as well as systems for manual recommendations including merchandising rules and systems 304. The invention also seamlessly integrates with other information sources related to the content, such as product catalogs 305, which can be used for purposes of display, filtering, or learning.

Finally, because of the comprehensive nature of the information collected and the learned affinities, the system represents a general wisdom platform 501 on top of which many applications can be built (see FIG. 4 and FIG. 5), such as SocialSearch 502, Content 503 and Product 504 Recommendations, Insights 505, eMail 506, and Live connect 507, which are tailored to various applications, e.g. eCommerce, eMarketing, etc. Additional applications include reports and integrations with bid management systems for search engine optimization and search engine marketing 508, mobile and IPTV 509, and custom applications or mashups 510. Some of these applications are described later. Each of these critical aspects is now described in greater detail.

Context- and Intent-Centric

A core advancement in the evolution of content recommendation embodied by the invention is the process for identifying and representing the user's current topic of interest and converting that interest into a set of useful recommendations and information. The steps in this process are shown in FIG. 6 and outlined below:

Step 1: When a user comes to a Web site, they immediately begin to establish their current context. They might do this by entering a query into a search box on the site, navigating to a particular section, or may even have established some context before arriving at the site by doing a search on an external search engine, such as Google or Yahoo, that led to this site. All of this information is captured by an Observer Tag, i.e. a piece of HTML/JavaScript embedded in the Web site. As the user continues to move through the site, they may also show interest in a particular page or piece of content, based on their implicit actions. Interest is determined based on various behaviors collected by the Observer Tag and analyzed using the invention's Full Spectrum Behavioral Fingerprint technology 601 (described in greater detail below). These pages of interest further contribute to the user's context.

All of this information is stored as the user's current context vector 602, which is a hybrid vector of terms and documents with weights on each entry reflecting how strongly that term or document reflects the user's current context. As a user enters search terms and/or clicks navigation links, the vector entries corresponding to the terms and phrases entered or clicked are incremented to capture expressed interest. As these actions move further into the past, the corresponding entries are decremented or decayed. Similarly, documents that a user clicks on or indicates interest in, as determined based on their implicit actions, increment the corresponding vector entry to a degree based on the level of interest and level of certainty determined by the invention. The result is a representation of the user's current context as a context vector. It is also possible to increment the context vector further, based on historical actions and historical contexts of interest of the user. Although this is not generally done in the preferred embodiment because such information can be misleading, it is possible to increment the context vector to a lesser degree based on these historical contexts in applications where historical behavior is considered to be relevant.
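The increment-and-decay behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name, the dictionary storage, the multiplicative per-action decay, and the factor 0.8 are hypothetical choices, since the specification says only that entries are incremented and later decremented or decayed.

```python
class ContextVector:
    """Hybrid sparse vector keyed by ("term", text) or ("doc", identifier)."""

    def __init__(self, decay=0.8):
        # Assumed multiplicative decay applied once per newly observed action.
        self.decay = decay
        self.weights = {}

    def observe(self, key, amount=1.0):
        """Age every existing entry, then increment the entry for `key`.

        `amount` stands in for the level of interest and certainty
        attached to the action (for example a lower value for a document
        with weak implicit interest).
        """
        for existing in list(self.weights):
            self.weights[existing] *= self.decay
        self.weights[key] = self.weights.get(key, 0.0) + amount


context = ContextVector(decay=0.8)
context.observe(("term", "digital SLR camera"))       # explicit search query
context.observe(("doc", "nikon-page"), amount=0.6)    # implicit interest in a page
```

After the second action, the earlier query entry has decayed to 0.8 of its original weight while the page entry carries the newer, weaker signal, so the vector leans toward recent behavior without discarding what came before.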

Step 2: Expand and refine the context vector into an intent vector 603, based on affinities/associations learned from the aggregated wisdom 604 collected from observations on the community as a whole over a longer term. For example, the user may have entered a query about digital SLR cameras and expressed implicit interest in a Nikon page. In the context vector, the entries corresponding to “digital SLR camera” as well as “digital,” “SLR,” and “camera” to a lesser degree are incremented, as is the vector entry corresponding to the particular Nikon page of interest. To create the intent vector, the system looks at affinities between the terms and documents in the context vector and other terms. For example, the community wisdom may have discovered that the term “high-resolution” is highly associated with both the term “Nikon” and the specific Nikon page of interest. The intent vector is thus incremented at the entry corresponding to “high-resolution.” Similarly, other documents that are associated with the terms in the context vector may be discovered based on the community wisdom and are incremented in the intent vector. For example, a Canon camera page may have strong affinities to “SLR camera” and become part of the intent vector to some degree. The affinities 604 that allow the expansion of the context vector into an intent vector are determined by the affinity engine, which is described below and in detail in U.S. patent application Ser. No. 11/319,928, filed Dec. 27, 2005. In summary, the affinity engine learns connections between documents and terms, documents and documents, terms and terms, as well as users to other users, documents, and terms by watching all of the implicit behaviors of the user community on the site and by applying behavioral fingerprinting and the UseRank algorithm to determine interest and associations. The ability to translate context into intent effectively is a key aspect of the invention.
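The expansion in Step 2 can be illustrated with a short sketch. The function name, the string keys, the affinity values, and the `spread` damping factor are assumptions for this example; the actual affinity computation is the subject of the referenced application and is not reproduced here.

```python
def expand_to_intent(context, affinities, spread=0.5):
    """Expand a context vector into an intent vector.

    `context` maps an entry (term or document) to its weight.
    `affinities[source][target]` is a learned association strength.
    `spread` is an assumed damping factor so that inferred entries
    count for less than entries the user expressed directly.
    """
    intent = dict(context)  # everything in the context carries over
    for source, weight in context.items():
        for target, affinity in affinities.get(source, {}).items():
            intent[target] = intent.get(target, 0.0) + spread * weight * affinity
    return intent


# Hypothetical values echoing the camera example in the text.
context = {"term:SLR camera": 1.0, "doc:nikon-page": 0.6}
affinities = {
    "term:SLR camera": {"doc:canon-page": 0.4},
    "doc:nikon-page": {"term:high-resolution": 0.9},
}
intent = expand_to_intent(context, affinities)
```

Here the Canon page and the term "high-resolution" enter the intent vector even though the user never touched either one, which is the behavior the camera example describes.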

Step 3: Identify the group of users who share affinity to the current intent, as well as those users who exhibit behavior most like the current user within the context of that intent. In U.S. patent application Ser. No. 11/319,928, filed Dec. 27, 2005, these are called “experts” and “peers” respectively, but here we combine them both under the name “peers.” This peer group is represented by a user vector 605, and each user entry in the vector may have a weight indicating how strong a peer that user is to the current user in this context. Although not generally recommended, the system is also capable of limiting the peer group to those users who match the current user based on a set of predefined attributes, such as age, gender, location, or other demographic variables. Similarly, the peer group can be limited to those users who visit the site at the same time of day, e.g. morning, afternoon, evening. In some cases these attributes can be used to influence the peer weights for users, such as giving slightly more peer weight to those users who also best match the user along these predefined variables. The invention can also learn these weights by comparing behavioral patterns, e.g. documents found useful in the simplest case, within and across each of the predefined attribute groups. Higher intra-group similarity in behavior compared to inter-group similarity indicates the group is differentiated, and thus a higher weight of influence for groupness is warranted when influencing the peer group. Similarity can be measured by looking at similarity of documents and terms used through one of many similarity calculations, e.g. cosine similarity, on users' aggregated interest vectors.
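Cosine similarity, named above as one possible similarity calculation, can be sketched on sparse interest vectors as follows. The dictionary representation and the sample users are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a user with no recorded interest is similar to no one
    return dot / (norm_a * norm_b)


# Hypothetical aggregated interest vectors for three users.
current_user = {"term:SLR camera": 1.0, "doc:nikon-page": 1.0}
camera_peer = {"term:SLR camera": 2.0, "doc:nikon-page": 2.0}
golf_user = {"term:golf clubs": 1.0}

peer_weights = {
    "camera_peer": cosine_similarity(current_user, camera_peer),
    "golf_user": cosine_similarity(current_user, golf_user),
}
```

Because cosine similarity ignores vector length, a heavy user and a light user with the same mix of interests score as close peers. The same function could be applied within and across attribute groups to compare intra-group and inter-group similarity.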

Step 4: Look at which documents have highest affinity to the identified peers within the current intent 606. To do this, the affinity engine looks at a combination of factors associated with those peers, including the overall usefulness of a piece of content, as represented by the activeness vector; learned affinities between terms and content, as represented by term-doc matrices; and predicted navigational patterns, as represented by next-step matrices. The factors are weighted according to their peer weight and used in aggregate to compute those documents with highest affinity to the current intent. These documents become the unfiltered recommendations 607. Filtering, described in greater detail below, may now be applied to limit or augment those community recommendations.
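One plausible reading of Step 4 is a peer-weighted sum over the three named factors. The sketch below assumes that each peer's contribution has already been reduced to per-document factor values; the function name, the factor weights, and the sample data are hypothetical and are not taken from the specification.

```python
def rank_documents(peer_weights, peer_factors, factor_weights):
    """Rank documents by peer-weighted factor scores, best first.

    `peer_factors[user][doc]` maps a factor name ("activeness",
    "term_doc", "next_step") to that peer's value for the document.
    """
    scores = {}
    for user, peer_weight in peer_weights.items():
        for doc, factors in peer_factors.get(user, {}).items():
            combined = sum(
                factor_weights.get(name, 0.0) * value
                for name, value in factors.items()
            )
            scores[doc] = scores.get(doc, 0.0) + peer_weight * combined
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Assumed relative importance of the three factors.
factor_weights = {"activeness": 0.2, "term_doc": 0.6, "next_step": 0.2}
peer_weights = {"u1": 1.0, "u2": 0.5}
peer_factors = {
    "u1": {
        "nikon": {"activeness": 0.5, "term_doc": 1.0, "next_step": 0.0},
        "canon": {"activeness": 1.0, "term_doc": 0.0, "next_step": 0.0},
    },
    "u2": {
        "canon": {"activeness": 1.0, "term_doc": 0.5, "next_step": 1.0},
    },
}
unfiltered = rank_documents(peer_weights, peer_factors, factor_weights)
```

The resulting list plays the role of the unfiltered recommendations; any filtering would be applied to it afterwards.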

Step 5: Now that the recommendations have been identified and appropriately filtered, a final optional step is to ask the affinity engine for community information on each of the recommendations 608. This information can be combined with other asset information, such as title, size, or price, and displayed to the user to help them understand the community wisdom underlying a recommendation. Many aspects of the community wisdom associated with a document in the affinity engine can be exposed, but in common implementations we expose the number of users in total who found value in the document, the number of peers who associated the document with the current context/intent, and the terms/phrases the peer community has associated with the document.

This latter piece, i.e. terms associated with the document, is called a virtual folksonomy because it represents terms that the community has associated with the document. Unlike a traditional folksonomy, discussed above, where users must explicitly make the associations, the virtual folksonomy is created automatically by the affinity engine based on the implicit actions of the user community. Terms that are searched or clicked as navigation links and lead, within one or more steps, to useful content, as determined by behavioral fingerprinting, become automatically associated. These term-doc connections are fundamental to the process of providing recommendations and also provide useful feedback to users on the topics associated with each recommendation. When these terms are displayed to a user, they can be made clickable so that the user can click them to provide further input regarding their current intent. This information then becomes part of their context vector, the whole process repeats, and new recommendations may be provided based on this new information. FIG. 7 provides a conceptual diagram of a virtual folksonomy, contrasted against traditional approaches for connecting terms to content.

Time (Decay, Trend, and Fad Detection)

Time factors heavily into all computations in moving from context to recommendation. First, all information collected by the affinity engine is subject to time decay. This means that, for example, information from last month has less of an influence on the calculations than information from today. This is important because the site may change, e.g. new content added or removed, people may individually change, e.g. their interests may change, and the community as a whole may shift interest, e.g. new fads. Although there is a default decay rate for all information, some information may decay away more rapidly if it is determined that this is necessary. For example, fad behavior, such as interest in Christmas products that come and go quickly, may need to be decayed more quickly to prevent recommending Christmas products too long after Christmas has passed. To prevent this, the system runs a trend detection system across all content on a periodic basis, e.g. once a day, or every five minutes for very time-sensitive sites. Trend detection can be done in a number of ways, but one way is to use the Mann-Kendall algorithm. Other ways include various regression techniques that employ least-squares fit. If a strong negative trend is detected for a given piece of content, the information associated with that content in the affinity engine is decayed at a more rapid rate. The result is that the likelihood of the affinity engine recommending this piece of content is reduced.
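As a rough illustration of trend-driven decay, the sketch below computes the Mann-Kendall S statistic over a series of per-period usage counts and switches to a faster decay rate when the normalized statistic is strongly negative. The decay rates and the threshold are assumed values, and the sketch omits the variance-based significance test and tie correction of the full Mann-Kendall procedure.

```python
def mann_kendall_s(series):
    """Mann-Kendall S: the sum of sign(x[j] - x[i]) over all pairs i < j."""
    s = 0
    for i in range(len(series) - 1):
        for j in range(i + 1, len(series)):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    return s


def decay_rate(series, default_rate=0.98, fast_rate=0.80, tau_threshold=-0.6):
    """Choose a per-period retention factor for one piece of content.

    `series` holds usage counts, oldest first. S is normalized by the
    number of pairs to give a value in [-1, 1]. The two rates and the
    threshold are hypothetical and would be tuned per site.
    """
    n = len(series)
    pairs = n * (n - 1) // 2
    if pairs == 0:
        return default_rate
    tau = mann_kendall_s(series) / pairs
    return fast_rate if tau <= tau_threshold else default_rate


post_holiday = [50, 40, 30, 20, 10]   # steadily falling interest
steady = [10, 12, 11, 13, 12]         # noisy but not declining
```

With these inputs the falling series selects the faster decay and the steady series keeps the default, so stale fad content loses influence sooner.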

Another algorithm for fad detection that is run on a periodic basis is cyclical fad detection. If a piece of content shows a strong positive trend and that same trend can be found at regular intervals in the past, then that piece of content can be automatically boosted in importance in anticipation of the coming trend.
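A minimal sketch of cyclical fad detection follows, assuming usage counts sampled at a fixed interval and a known candidate period. The rising-trend test, the window length, and the required number of repeats are illustrative assumptions; the specification does not say how the regular intervals are found or compared.

```python
def is_rising(window, min_tau=0.6):
    """True if the window shows a strong positive rank trend."""
    n = len(window)
    pairs = n * (n - 1) // 2
    if pairs == 0:
        return False
    s = sum(
        (window[j] > window[i]) - (window[j] < window[i])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return s / pairs >= min_tau


def is_cyclical_fad(series, period, window, min_repeats=2):
    """Check whether the current rise also occurred in earlier cycles.

    `series` is oldest first. The last `window` samples must be rising,
    and the samples at the same position in at least `min_repeats`
    earlier cycles of length `period` must have been rising as well.
    """
    if not is_rising(series[-window:]):
        return False
    repeats = 0
    end = len(series) - period
    while end - window >= 0:
        if is_rising(series[end - window:end]):
            repeats += 1
        end -= period
    return repeats >= min_repeats


one_cycle = [1, 1, 1, 1, 1, 1, 1, 2, 4, 8]   # hypothetical seasonal surge
seasonal = one_cycle * 3
one_off = [1] * 27 + [2, 4, 8]
```

In this sketch the repeating surge is flagged as cyclical, while the same surge appearing for the first time is not, so only the former would qualify for an anticipatory boost.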

In a traditional Web site, even where the site proprietor is trying to understand customer behaviors, techniques that are typically used survey the customer and analyze the customer responses to produce a report. By the time the report is analyzed, the results of the report are out of phase with the actual situation at the time the report is being reviewed. For example, in a traditional system, a commerce site might collect feedback during the Christmas season and redesign their site in February in response to that information. In effect, the commerce site is trying to sell Christmas products in February. Alternatively, analytics reports may be created offline and analyzed by a team of specialists to infer trends and determine appropriate actions, a process which may take weeks or months and again lead to out-of-date site modifications. By contrast, the automatic nature of the invention herein allows a Web site to adapt in real-time to discovered trends and fads, providing recommendations to users that fit the current context and time.

In addition to contributing to the process of automatic content recommendation, however, the trend and fad detection algorithms used by the invention can also be exposed to owners of the content system, e.g. the Web site, through reports within a customer portal. In the preferred embodiment, such reports on community trends and fads can, for example, be used by merchants to promote certain products or content at the right time, making such promotions more effective. Given the example of products that are sold during the Christmas season, if such sales were to die out after Christmas, then the proprietor of the commerce site would be able to follow the sales curve based on community interaction with the Web site. If, on the other hand, an emergent news story drives demand for a product quickly, the invention allows the merchant to watch the curve of demand and respond in real-time. Thus, if people are all of a sudden dramatically interested in a particular piece of content or product, the proprietor of the Web site can provide that content or product more quickly. If the demand dies out quickly, for example the interest was based on a fad, then the proprietor of the site can adjust to that fact as well.

Affinity Engine

As already mentioned, and as discussed in greater detail in U.S. patent application Ser. No. 11/319,928, filed Dec. 27, 2005, the affinity engine learns connections/affinities between terms-documents, terms-terms, terms-users, documents-documents, documents-users, and users-users. (Note: the terms document, content, and asset are used interchangeably throughout this document. In all cases, these terms refer to any type of medium that provides information or services including, for example but not limited to, webpages, word documents, pdfs, video, audio files, widgets, and/or products). In the preferred embodiment, all affinities are stored as sparse matrices and vectors. However, there are many alternative ways of storing such information known to those skilled in the art. Although there is ultimately a single number that can be calculated to represent the affinity between any two entities, e.g. a document and term, there are usually several sub-affinities that are combined in a weighted sum to arrive at that single number. The weights on that sum may be dependent on context. For example, documents have at least three kinds of affinities to other documents: similarity in virtual folksonomy, i.e. terms the community has associated to the documents; similarity in user groups, i.e. how many users have used both documents; and similarity in navigational usage patterns, e.g. do users show a pattern of finding value in one document after using the other. A variety of mathematical techniques are available to be employed as appropriate to each kind of data, although a combination of vector space models and custom probabilistic techniques is currently used by the preferred embodiment of the invention. See U.S. patent application Ser. No. 11/319,928, filed Dec. 27, 2005 for more discussion.
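
The context-dependent weighted sum described above can be illustrated with a minimal sketch. The sub-affinity names, the two contexts, and all numeric weights below are hypothetical values chosen for illustration; the source does not specify them.

```python
from typing import Dict

# Hypothetical context-dependent weights (assumed values, not from the source).
WEIGHTS_BY_CONTEXT: Dict[str, Dict[str, float]] = {
    "search": {"folksonomy": 0.5, "user_groups": 0.2, "navigation": 0.3},
    "browse": {"folksonomy": 0.2, "user_groups": 0.3, "navigation": 0.5},
}


def combine_sub_affinities(sub_affinities: Dict[str, float], context: str) -> float:
    """Collapse several sub-affinities into one number with a weighted sum."""
    weights = WEIGHTS_BY_CONTEXT[context]
    # Sub-affinities without a weight in this context contribute nothing.
    return sum(weights.get(name, 0.0) * value
               for name, value in sub_affinities.items())


# Document-document sub-affinities for one pair of documents (made-up numbers).
pair = {"folksonomy": 0.6, "user_groups": 0.3, "navigation": 0.9}
search_affinity = combine_sub_affinities(pair, "search")
browse_affinity = combine_sub_affinities(pair, "browse")
```

The same pair of documents receives a different overall affinity in each context because only the weights change, not the underlying sub-affinities.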

One important aspect of the invention is that all sub-affinities are always computed bi-directionally. To illustrate this bidirectionality, we describe its application to the term-document sub-affinity derived from the virtual folksonomy. FIG. 8 is an architectural diagram showing the aspects of the invention that involve topic attraction and topic match, the two dimensions that make up the bidirectional virtual folksonomy connection between terms and documents, as well as general activeness. In the example of FIG. 8, a query is provided 801 and the percentage of respondents who ultimately find value in each of three example documents is indicated. The query involves a particular Nikon camera and it can be seen that 20% of the respondents found value in a Web page focused on high-resolution cameras 802, 78% found value in the overview page for the specific camera 803, and only 2% of users making this query found value in the detailed specification sheet for the camera 804. As a side note, remember that value is determined by the full spectrum of behaviors exhibited on a given page by the user (described in detail later).

There are two ways to consider the virtual folksonomy connection between the query and each example document. Topic attraction starts from the query and considers the proportion of users who searched for “Nikon 580X” and subsequently found value in each document. In this example, the Overview page 803 has highest topic attraction, with 78% of users finding value there, followed by the High-resolution camera page 802 with 20%, and then the spec sheet 804 with only 2%. Note that this example is simplified for the purposes of illustration. In reality, we also break down the query into its component parts and consider the affinities of these to each document, as well as consider affinities between other terms that the affinity engine has learned have similar meaning to this one. In this way, topic attraction can be thought of as a problem of predicting the probability that a user finds value in a given document, given the intent represented by the query. In the preferred embodiment, arriving at this probability is accomplished using a combination of probabilistic techniques, including Bayesian inference.
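
As a rough illustration of topic attraction, the sketch below estimates the probability that a searcher finds value in each document using a Beta-smoothed proportion. This is only one simple Bayesian estimator and is an assumption for illustration; the source does not disclose the exact probabilistic techniques. The document labels and the prior parameters are hypothetical.

```python
from typing import Dict


def topic_attraction(value_counts: Dict[str, int], searchers: int,
                     prior_hits: float = 1.0,
                     prior_misses: float = 1.0) -> Dict[str, float]:
    """Posterior mean of P(user finds value in doc | query) per document.

    Each document is treated as an independent Beta-Binomial problem: the
    prior pseudo-counts keep documents with little data from looking certain.
    """
    denominator = searchers + prior_hits + prior_misses
    return {doc: (hits + prior_hits) / denominator
            for doc, hits in value_counts.items()}


# Counts mirroring the FIG. 8 example, assuming 100 users issued the query.
counts = {"overview_803": 78, "high_res_802": 20, "spec_sheet_804": 2}
attraction = topic_attraction(counts, searchers=100)
ranked_by_attraction = sorted(attraction, key=attraction.get, reverse=True)
```

With these counts the overview page ranks first, then the high-resolution page, then the spec sheet, matching the ordering described in the text.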

Topic match looks at it from the other direction, starting from each document and considering all of the other terms that have been associated with it through the behaviors of other users. So, for example, the High-resolution camera page 802 may have many other camera names connected to it and, thus, the degree of focus on the particular query entered is lower than both of the other two pages, whose term-document connections are more highly concentrated around “Nikon 580X” specifically. In fact, the Spec-sheet 804 may turn out to be the document with term connections most focused around “Nikon 580X” and thus have the highest topic match for this query, even though it has the lowest topic attraction. As with topic attraction, in reality, topic match also breaks down the query into its component parts and takes into account affinities between other terms not in the query. In this way, topic match can be seen as finding the document with the best overall match in topic to the intent represented by the query. In the preferred embodiment, a vector space model and modified cosine similarity technique, similar to that used by traditional full-text search, is used to determine the degree of matching.
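
Topic match can be sketched with plain cosine similarity between a query vector and each document's term-connection vector. The source describes a modified cosine similarity without giving the modification, so the unmodified form is used here as a stand-in; the term weights are invented for illustration.

```python
import math
from typing import Dict

TermVector = Dict[str, float]


def cosine_similarity(a: TermVector, b: TermVector) -> float:
    """Standard cosine similarity between two sparse term vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = {"nikon 580x": 1.0}
# Hypothetical term connections learned for each page from user behavior.
documents = {
    "high_res_802": {"nikon 580x": 20.0, "camera a": 30.0,
                     "camera b": 30.0, "camera c": 20.0},
    "overview_803": {"nikon 580x": 78.0, "nikon": 10.0},
    "spec_sheet_804": {"nikon 580x": 2.0},
}
topic_match = {doc: cosine_similarity(query, terms)
               for doc, terms in documents.items()}
best_match = max(topic_match, key=topic_match.get)
```

Because the spec sheet's connections are concentrated entirely on the queried term, it receives the highest topic match even though few users reach it, which is the asymmetry the paragraph above describes.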

These two directions of considering the affinity of each document to the query, i.e. topic attraction and topic match, are combined in a non-linear weighted sum along with the final factor of overall activeness, i.e. usefulness, to arrive at a value for the virtual folksonomy sub-affinity. This sub-affinity is then combined with other sub-affinities, such as those based on navigational patterns, and filtered through the lens of peer groups, to arrive at an ultimate ranking of documents against the query, called UseRank. Similar bidirectional techniques are used to compute UseRank when providing recommendations on a particular Web page, and when considering the full context and intent vectors.

There are several methods for combining the various sub-affinities and the bidirectional dimensions therein to arrive at the ultimate UseRank of documents for a particular user and context. In the preferred embodiment, one or more thresholds are applied to each dimension, including absolute and relative thresholds, before they are combined together based on a hybrid arithmetic-geometric weighted average. Each resulting sub-affinity is then similarly subjected to thresholding before being combined with other sub-affinities, again based on a hybrid arithmetic-geometric weighted average. The values for the thresholds are generally fixed; however, the individual weights for the weighted averages can also be adjusted and learned based on the success of the result set returned by the affinity engine.
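
The thresholding and averaging step can be sketched as follows. The source does not define the hybrid arithmetic-geometric weighted average, so this sketch assumes one plausible reading: a linear blend of the weighted arithmetic mean and the weighted geometric mean, controlled by a hypothetical `mix` parameter. The threshold semantics are likewise assumptions.

```python
import math
from typing import List, Sequence


def apply_thresholds(values: Sequence[float], absolute_min: float,
                     relative_min: float) -> List[float]:
    """Zero out values below an absolute floor or below a fraction of the max."""
    top = max(values, default=0.0)
    return [v if v >= absolute_min and v >= relative_min * top else 0.0
            for v in values]


def hybrid_average(values: Sequence[float], weights: Sequence[float],
                   mix: float = 0.5) -> float:
    """Blend of weighted arithmetic and weighted geometric means (assumed form)."""
    total = sum(weights)
    arithmetic = sum(w * v for w, v in zip(weights, values)) / total
    if any(v <= 0.0 for v in values):
        geometric = 0.0  # a single zero dimension collapses the geometric mean
    else:
        geometric = math.exp(
            sum(w * math.log(v) for w, v in zip(weights, values)) / total)
    return mix * arithmetic + (1.0 - mix) * geometric


# Topic attraction and topic match for one document (illustrative numbers).
balanced = hybrid_average([0.5, 0.5], [1.0, 1.0])
lopsided = hybrid_average([1.0, 0.0], [1.0, 1.0])
```

The geometric component penalizes a document that scores well in one dimension but fails the other, while the arithmetic component keeps it from being eliminated outright.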

Full-Spectrum Behavioral Fingerprints

As discussed herein, a key feature of the invention is the processing and analysis of implicit observations that are made during the actual use of a Web site, for example, versus traditional approaches in technology which use explicit feedback. One key advantage of the invention, as confirmed through scientific studies, and as understood from human psychology and sociology, is that humans are very bad at giving feedback, particularly if the feedback must be given explicitly. If a person is being surveyed, the person does not have an incentive to give accurate feedback. One aspect of the invention eliminates such survey bias by using the implicit behaviors observed during use of particular materials on the Web by individuals within a community. Thus, the invention trusts what people do rather than what they say. The invention watches people's behavior through their actual actions, i.e. implicit behaviors, and can accurately interpret what their true intent is and whether or not they dislike or like something. Thus, the invention observes behavior rather than click actions alone.

Current technologies focus on clicks, such as Web link clicks. If a person clicks on a link, that click is reported. The click may be reported as having resulted in a viewed Web page, even though not much time is spent on the page on which the person clicked. Thus, this known approach is not a good indication one way or another if the page is good or bad. If a link is put in a prominent position on a Web page, then people are likely to click on it. However, when people get to the location indicated by the link, they may immediately leave the site. This is why the number one used button on the browser is the BACK button. The use of the BACK button could indicate like or dislike of a site. Accordingly, the invention recognizes that it is the action of the user after the click that matters and not the click itself. The invention tracks behaviors beyond the click to determine whether a page is good or bad. Thus, if a person backs out of the page, it is considered negative feedback, i.e. the person did not like it. In this way, clicks can identify a very negative reaction in the invention. If a person goes to a link, follows the link down, spends time there, and does other things, that behavior is tracked as well. If a peer group validates the behavior as consistent, e.g. a significant number of the group members exhibit the same behavior when interacting with the page, then the page is considered to be a good page.

An embodiment of the invention goes one step farther. Not only does the invention determine which assets are useful based on behavioral fingerprinting, but it also learns the context associated with the usefulness. For example, the invention can learn that a particular camera page is very useful for users that show interest and intent in high definition cameras, but not if the intent is compact cameras. In this way the affinity engine is able to distinguish the usefulness of assets based on the context and intent expressed by a user through their implicit actions.

The primary input to the affinity engine, which drives all of the learned associations (affinities), is the behaviors of users on the site. In the preferred embodiment, all behaviors are captured by the Observer Tag, i.e. a piece of HTML/JavaScript embedded within the Web site, typically in a header or footer template. Although this is the preferred method for capturing user behaviors, it is also possible to capture user behaviors using a browser plug-in or lower-level network traffic analyzer. The behaviors captured include:

Pages visited and in what order

Time spent on each page

Links clicked

Searches performed

Time spent scrolling on a page

Portion of page visible in the browser window, and for how long

Page sub-elements opened/closed

Media launched, time spent viewing that media, and explicit action taken on the media

Use of the back-button

Repeated visits to a particular piece of content

Mouse movement while on the page

Ads viewed and Ads clicked

Explicit actions, e.g. Add to Cart, Purchase, Email, Save, Print, Print Preview

Virtual Print: content returned to frequently over the course of several hours or days

Virtual Bookmark: content left open over multiple hours or days with intermittent periods of activity

Entrance and Exit paths

The captured information is sent back to the affinity engine for processing. FIG. 9 shows the process for analyzing these behaviors or other implicit data captured from various user interface devices. The input to this process is the user trail 901, which includes all assets visited and the implicit (and explicit) actions observed on those assets. There are two main steps in processing the behaviors that combine to provide an understanding of what content is useful and in what contexts. The first step is determining whether a user is finding value (usefulness) in a particular piece of content. Conceptually, the more time spent on the page in think mode, i.e. the user is processing the information, the higher the likelihood that it is useful. Think mode can be approximated by non-scrolling time on a page, where some scrolling or mouse-movement has been detected within a specified time range. Repeat visits and percentage of page seen are also generally good indicators of liking or usefulness. For each user and piece of content, we can create a behavior vector 902 with each entry in the vector representing one of the features listed above or a predefined combination of features. However, not all content is created equal nor are all users created equal. We normalize the behavior vectors in several ways. First, we normalize by user to make behavior vectors comparable to the rest of the user population 903. For example, some users may read slower than others, affecting their mean time spent on a page. Some users may use the mouse more than others. One way to accomplish this normalization is by translating all entries into z-scores, which adjust for means and standard deviations specific to that user.
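
The per-user z-score normalization can be sketched as below. The feature names and the shape of the stored per-user history are assumptions made for illustration; the population standard deviation is used, and features with too little history or no variation are mapped to zero as a simplifying choice.

```python
from statistics import mean, pstdev
from typing import Dict, List


def zscore(value: float, history: List[float]) -> float:
    """Z-score of a value against one user's own past values for a feature."""
    if len(history) < 2:
        return 0.0  # not enough history to estimate this user's spread
    spread = pstdev(history)
    return 0.0 if spread == 0 else (value - mean(history)) / spread


def normalize_by_user(vector: Dict[str, float],
                      user_history: Dict[str, List[float]]) -> Dict[str, float]:
    """Convert each raw behavior feature into a user-specific z-score."""
    return {feature: zscore(value, user_history.get(feature, []))
            for feature, value in vector.items()}


# A slow reader: 30 seconds is only mildly above this user's usual dwell time.
history = {"dwell_seconds": [10.0, 20.0, 30.0], "mouse_moves": [5.0, 5.0, 5.0]}
normalized = normalize_by_user({"dwell_seconds": 30.0, "mouse_moves": 9.0},
                               history)
```

After this step a dwell time is expressed relative to the individual's habits, so a fast reader and a slow reader become directly comparable.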

A second normalization is to normalize based on the content. This is done based on inherent or specified properties of the content. For example, 30 seconds spent on a one paragraph document likely has a different meaning than 30 seconds spent on a ten page document. Dwell time can thus be normalized based on page length or number of words. Similarly, 30 seconds spent viewing a 30 second video has a different meaning than 30 seconds spent viewing a five minute video. In many cases the Observer Tag is capable of capturing these page characteristics through information within the DOM (Document Object Model). However, when not available in the DOM, the invention provides other mechanisms for collecting the needed information from offline catalogs, e.g. a product or media catalog, or allowing the Web site designer to add explicit information in the page itself, e.g. as meta-tags or added JavaScript variables. For example, on an e-commerce site, certain pages may be defined as information versus product pages. The system can normalize the behavior vectors for each of these content groups independently. In this way, the dwell time necessary to indicate liking can be adaptable to the type of content. All behaviors (features) can be made adaptable in this way. In the current implementation, these normalization strategies are hard-coded. However, the invention allows for plugging in various machine learning techniques to learn the appropriate normalizations.
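
A minimal sketch of content-based normalization, using the two examples from the paragraph above. Dividing dwell time by word count and capping the viewed fraction of a video at 1.0 are illustrative choices, not the hard-coded strategies of the actual implementation, which the source does not disclose.

```python
def dwell_per_word(dwell_seconds: float, word_count: int) -> float:
    """Dwell time scaled by document length, in seconds per word."""
    return dwell_seconds / max(word_count, 1)


def fraction_viewed(view_seconds: float, media_seconds: float) -> float:
    """Share of a video or audio asset that was actually played, capped at 1."""
    if media_seconds <= 0:
        return 0.0
    return min(view_seconds / media_seconds, 1.0)


# 30 seconds on a short paragraph versus a long ten-page document
# (word counts are made-up figures).
short_doc = dwell_per_word(30.0, 60)
long_doc = dwell_per_word(30.0, 3000)

# 30 seconds of a 30-second clip versus a five-minute video.
short_clip = fraction_viewed(30.0, 30.0)
long_video = fraction_viewed(30.0, 300.0)
```

The same raw 30 seconds yields a strong signal for the short assets and a weak one for the long assets.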

We now have a behavior vector normalized to the user and the content 904. In the current system, a predefined set of rules is applied to determine whether this behavior vector represents liking/usefulness. Each of the normalized features, in turn, is considered to determine whether it meets the pre-specified thresholds for indicating usefulness. Each passing feature increases the probability of usefulness 905. In some cases, the thresholds for a given feature are pre-specified and tuned by the person implementing the system based on previous experience. In other cases, these thresholds are dynamically determined by the system. For example, certain features are known to exhibit a bimodal distribution and the threshold can be dynamically determined to lie between the two modes. The invention also provides a mechanism for plugging in various machine learning techniques to learn which features are most important, and thus dynamically learn the rules for converting from the behavior vector to the probability of usefulness.
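
The rule-based step can be sketched under two stated assumptions. First, the source says only that each passing feature increases the probability of usefulness; the noisy-OR combination below is one simple way to realize that, with hypothetical per-feature lifts. Second, the dynamic threshold for a bimodal feature is located here with a one-dimensional two-means split, which is a stand-in technique rather than the disclosed one.

```python
from statistics import mean
from typing import Dict, List


def usefulness_probability(features: Dict[str, float],
                           thresholds: Dict[str, float],
                           lifts: Dict[str, float]) -> float:
    """Noisy-OR: every feature that meets its threshold raises the probability."""
    p_not_useful = 1.0
    for name, value in features.items():
        if name in thresholds and value >= thresholds[name]:
            p_not_useful *= 1.0 - lifts[name]
    return 1.0 - p_not_useful


def bimodal_threshold(samples: List[float], iterations: int = 20) -> float:
    """Midpoint between two cluster means found by a simple 1-D two-means loop."""
    low_center, high_center = min(samples), max(samples)
    for _ in range(iterations):
        cut = (low_center + high_center) / 2
        low = [s for s in samples if s <= cut]
        high = [s for s in samples if s > cut]
        if not low or not high:
            break  # all samples fall on one side; no second mode to separate
        low_center, high_center = mean(low), mean(high)
    return (low_center + high_center) / 2


cut = bimodal_threshold([1, 2, 3, 21, 22, 23])
thresholds = {"dwell_z": 1.0, "repeat_visits": 1.0}
lifts = {"dwell_z": 0.4, "repeat_visits": 0.3}  # hypothetical values
one_pass = usefulness_probability({"dwell_z": 1.5, "repeat_visits": 0.2},
                                  thresholds, lifts)
two_pass = usefulness_probability({"dwell_z": 1.5, "repeat_visits": 2.0},
                                  thresholds, lifts)
```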

Once usefulness, or probability of usefulness, has been determined, the second step in the analysis of behaviors is to understand the entire context surrounding the use of that piece of content, including searches done prior to the use, links clicked prior, and pages used prior. All of this information combines with information about the user to influence affinities 906 (affinities can be learned and stored in a number of ways, described in more detail in U.S. patent application Ser. No. 11/319,928, filed Dec. 27, 2005). Before doing so, however, there is one final step: validation. Validation is a form of noise filtering wherein an affinity connection is established, e.g. between a document and term, if and only if enough similar users have, through their behaviors, confirmed this connection. One user making the connection is not enough for an affinity to emerge. A minimum number of users in the same context (peers) must have exhibited similar behaviors and connections for that affinity to be validated. We call this peer-validated behavior.
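
Peer validation as described above can be sketched as a simple count of distinct users per connection within a context. The tuple layout and the `min_peers` default are hypothetical; the source states only that a minimum number of peers is required.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Observation = Tuple[str, str, str, str]  # (user, context, term, document)
Link = Tuple[str, str, str]              # (context, term, document)


def validated_affinities(observations: Iterable[Observation],
                         min_peers: int = 3) -> Set[Link]:
    """Keep a term-document link only if enough distinct peers confirmed it."""
    users_by_link = defaultdict(set)
    for user, context, term, document in observations:
        users_by_link[(context, term, document)].add(user)
    return {link for link, users in users_by_link.items()
            if len(users) >= min_peers}


observations = [
    ("u1", "golf", "driver", "doc_a"),
    ("u2", "golf", "driver", "doc_a"),
    ("u3", "golf", "driver", "doc_a"),
    ("u1", "golf", "driver", "doc_a"),  # repeat by the same user adds nothing
    ("u4", "golf", "putter", "doc_b"),  # a single user is treated as noise
]
validated = validated_affinities(observations)
```

Counting distinct users rather than raw events means one very active visitor cannot create an affinity alone.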

UseRank™

UseRank™, the ultimate ranking of the usefulness of content based on a user's context and intent, based on learned affinities and full-spectrum behavioral fingerprinting, can be compared to the very popular PageRank strategy made famous by Google. In PageRank, each Web page is given a value based on the number of other pages linking to it. In addition, links coming from pages that are themselves of high value raise the PageRank even more. This democratic strategy is quite effective in getting users to useful Web sites based on their Google query, but generally breaks down once the user begins to look for further information within the Web site itself. The main reason for this is that pages within a Web site, particularly those down into the long tail of content, are not heavily linked to either externally or internally. Internally, the linkages are determined solely by the structure of the Web site and more links do not necessarily mean more value. In addition, many of the pages on such sites are in formats other than HTML, e.g. PDFs, Word Docs, or videos, and do not link to other content at all.

The UseRank methodology, based on user behaviors on the Web site, alleviates all of these problems. Instead of relying on the linking of documents by Web designers, it relies on usage of documents by Web users, who are a rich source of information on any Web site. Usage is a truly democratic way of learning what content is most valuable. In fact, the designers of PageRank recognized their approach as an approximation of user activity. Google's choice to approximate user value based on PageRank makes sense given the privacy concerns associated with tracking a user's behavior across the entire Web. In the preferred embodiment of UseRank, we track only the behaviors of users on sites instrumented with the Observer Tag, and always in an anonymous fashion unless otherwise configured by the Web site deploying the Observer Tag.

Long-Tail

Another important aspect of the invention emerges from the context-driven approach underlying the affinity engine. Many previous systems that are content-centric or user-centric suffer from a problem where only the most popular product/content is ever recommended. Because these systems lack a deep understanding of the user's current context/intent, this is the best that they can do. The invention provides a context-centric approach that allows the affinity engine to narrow its focus to the subset of users (peers) and content that match the current user's context/intent, even if that intent is not particularly popular in the grand scheme of things. This is what is known as the long tail of the products, i.e. those products that may not be the most popular overall, but are extremely important because they strongly meet the need of a small but important subset of the community (see C. Anderson, The Long Tail: Why the Future of Business Is Selling Less of More, Hyperion Press (2006)). The invention is capable of identifying that long-tail interest along with the associated peer group and related content and can thus recommend those important long-tail products/content at the appropriate time (see FIG. 10). This is critical for product/content providers because it is often these long-tail products that are most useful to the community and often lead to the highest margins or benefits for the company itself.

Thus, the invention uses like-minded peers as an enabler to target small target segments along the long tail, instead of using individual problem personalization and historical interest as known in the prior art. The invention provides contextual targeting. People's needs within a given context are typically similar among like-minded people. An insight of the invention is that people have hundreds of profiles, if not thousands of profiles. These profiles have no cross connections to reality, so it is difficult to predict an individual's use level without context. Once the individual is in a particular context, i.e., among like-minded peers, the individual behaves similarly to everybody else within the group context. Thus, if a person is visiting a golf equipment site to buy a golf driver or clubs, the person behaves in a manner similar to other golfers. It does not matter what the person's political bias is, or their cultural background. These aspects of the individual have no relevancy because the person's present context, which is based upon group membership, is more relevant.

Wisdom of the Crowd

The invention represents a new approach to leveraging implicit community wisdom to create adaptive Web sites and other information portals. In his book, The Wisdom of Crowds, James Surowiecki explains how the collective intelligence of a large group of average individuals almost always outweighs the intelligence of experts. To illustrate the concept, he uses an example from a county fair, where a group of fair attendees attempted to guess the weight of a cow. A group of so-called experts, e.g. butchers, dairy farmers, etc., also made their guess. In the end, the experts were all off on their guesses by a large amount. The average weight guessed by the crowd of non-experts, however, came within one pound of the actual weight of the cow. Surowiecki goes on to show how a similar phenomenon can be seen in everything from stock market prediction to democratic governance. This notion that groups of individual actors can collectively exhibit a level of intelligence going beyond even the sum of the individual actors themselves has been known in the fields of biology and artificial intelligence as emergent behavior and collective intelligence. Numerous examples exist in nature where very simplistic individual animals, such as ants or bees, in collection exhibit extraordinary intelligence and resourcefulness in meeting the needs of the group, such as finding food or building nests.

In the context of Web site design, Surowiecki's experts are the Web designers. These experts attempt to make correct decisions on which Web pages should be linked to what others, such that visitors are best able to find what they are looking for. Web designers may also spend a lot of time and effort tuning search results to match the expected needs of their visitors. The crowd in this context is the large group of Web site visitors who come to the site. Generally, the crowd has no direct impact on the organization of the Web site. They remain silent.

The invention gives this crowd a voice, enabling their collective actions, i.e. a form of expressed opinions, to be collected and automatically drive decisions on Web site organization through their impact on recommended content. The invention thus taps into the wisdom of the crowd for the purposes of creating useful Web sites. Although in the preferred embodiment, the impact of crowd wisdom is sectioned off into specific regions of recommendations and social search within the Web site, it is easy to imagine an extended implementation where the entire organization of the Web site, all of its links and menus, for example, are ultimately driven by and adapted to the behaviors of the Web site visitors. In this way, the entire Internet made up of multiple interconnected sites could begin to evolve and adapt into a form that best meets the needs of Internet users.

FIG. 11 provides a Wisdom of Crowds pseudo-equation showing how context/intent can be incorporated in a generalized way to extend the wisdom of crowds concept, as exemplified by an embodiment of the invention. In this pseudo-equation, each user's implicit vote is weighted according to the similarity between the current user's context and the context in which a user made that vote. This equation could be further extended to incorporate similarity between users. When all contexts are identical, this equation correctly reduces to a simple average.
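
One way to write a context-weighted vote of this kind is shown below. This is an illustrative form consistent with the description above, not a reproduction of FIG. 11; the symbols are assumptions: v_u(d) is user u's implicit vote on document d, c is the current user's context, c_u is the context in which user u cast that vote, and s is a non-negative similarity function.

```latex
\hat{v}(d \mid c) \;=\;
  \frac{\sum_{u \in U} s(c, c_u)\, v_u(d)}
       {\sum_{u \in U} s(c, c_u)}
```

If every vote was cast in the same context as the current one, s(c, c_u) takes the same value for all users, the similarity terms cancel, and the expression becomes (1/|U|) times the sum of v_u(d), i.e. the simple average.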

Memory Prediction Machine

The actions of the Affinity Engine can also be considered as a form of Memory Prediction Machine. Recent learnings from cognitive science teach that the brain is structured to capture and encode associations between and among objects and concepts in targeted regions of memory, and organize those associations in a hierarchy moving from abstract to detailed. New stimuli are responded to, and their consequences predicted, based on the memories created from previous stimuli and learned associations thus encoded. The Affinity Engine similarly learns associations between users, objects, and contexts by observing interactions between them in the environment of Web sites and remembers those patterns in its memory, stored hierarchically. When a user exhibits a context that has been previously learned by the Affinity Engine, it lights up the appropriate associations within memory, which triggers the prediction of those objects that best meet the needs of the current user and context. FIG. 12 is an architectural schematic of the Affinity Engine in its function as a memory prediction machine.

Seamless Integration with External Systems

An important aspect of the invention is its ability to integrate seamlessly with other recommendation systems, policies, and information sources. Some examples are shown in FIG. 3.

Search Engine

When a user performs a search on a site implemented with the invention, the search terms are sent to the affinity engine for processing. The affinity engine determines the set of content that is most useful given the context of that search, based on learned affinities from past community behaviors. However, the affinity engine can also take into account opinions from external sources, such as a search engine. The search engine 301 has its own set of recommendations which the affinity engine can accept. The relevance ratings from the search engine can be combined with the information embodied in the affinity engine to produce a single unified set of recommendations (discussed more fully in U.S. patent application Ser. No. 11/319,928, filed Dec. 27, 2005). An XML feed is a typical way for the system to interface with a search engine.
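
As a hedged illustration of producing a single unified set of recommendations, the sketch below blends the two sources linearly. The actual combination method is described in the referenced application and is not reproduced here; the linear blend, the `search_weight` parameter, and the assumption that both scores lie on a comparable 0 to 1 scale are all illustrative.

```python
from typing import Dict, List


def blend_scores(affinity: Dict[str, float],
                 search_relevance: Dict[str, float],
                 search_weight: float = 0.3) -> List[str]:
    """Rank documents by a linear mix of affinity and search-engine relevance."""
    documents = set(affinity) | set(search_relevance)
    blended = {
        doc: (1.0 - search_weight) * affinity.get(doc, 0.0)
        + search_weight * search_relevance.get(doc, 0.0)
        for doc in documents
    }
    # Sort by descending score; break ties by document id for determinism.
    return sorted(blended, key=lambda doc: (-blended[doc], doc))


affinity_scores = {"a": 0.9, "b": 0.2}
engine_scores = {"b": 1.0, "c": 0.5}
community_led = blend_scores(affinity_scores, engine_scores, search_weight=0.3)
engine_led = blend_scores(affinity_scores, engine_scores, search_weight=0.9)
```

A document known to only one source still appears in the unified list, scored as zero by the source that has no opinion on it.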

Product Catalog

The invention also provides a mechanism for automatically importing product or media catalogs 305. The preferred means for doing so is using an XML feed that is accessible to the system. There are three ways in which the catalog can be used:

1) Information from the catalog may be displayed along with the recommendations, e.g. summary or price;

2) Attributes from the catalog may be used to filter the result set, e.g. the Web site designer may want to restrict recommendations to products only or PDFs only in a particular section of the Web site;

3) Content categorizations can be used to group content for the purposes of normalization in the behavioral fingerprinting process described above.

Ad Servers

Another aspect of the invention is that it is content-agnostic. The system can observe behaviors on and recommend Web pages, pictures, videos, documents, blogs, downloads, and even ads. In the case of ad recommendations, it is often necessary to integrate with an ad server. Ad servers 303 can provide functionality similar to both search engines and catalog systems as described above. The invention interfaces with ad servers in the same way as it does with these, typically through XML.

Merchandizing Systems

Merchandizing systems 304 in the general case provide a mechanism for the Web site owners to influence what products/content are recommended to users. In the invention, it is preferred that the community be the primary driver of recommendations. However, there are times when the owners have specific needs that must be met independent from the community's expressed interest. For example, the owner may choose always to recommend a particular product or piece of content first in a particular context, e.g. when a promotion is running for a product. Owners may also wish to influence the community to purchase products with higher margins, or prevent a certain piece of content from being recommended. There are two ways in which the invention can honor such influence. The first is through a custom rule interface that is one component of the invention. Here, Web site owners can log in and choose from a variety of predefined rule types including pinning, i.e. forcing a particular recommendation to show up; blacklisting, i.e. preventing a given recommendation or class of recommendations from showing up; and boosting, i.e. increasing the chances that a recommendation or class of recommendations shows up, while still honoring the community wisdom. All of these rule types can be applied globally or only in a given context, e.g. on a particular page, for a given search term, for a given class of users, in a particular time range. Another type of rule allows owners to honor fully the community wisdom, but influence the goal of the recommendations. For example, rather than focusing on recommending useful products, the affinity engine can be told to recommend products that a user is most likely to purchase. As another example, the affinity engine can be asked to recommend content that most likely leads users to a particular set of target pages. In addition to the custom rule interface, the invention provides a mechanism for importing rules in XML form from an external merchandising application. When rule types exist in the external system that do not already exist within the invention, a custom rule plug-in can be designed for the invented system which matches the desired behavior of the external rule type.
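
The three predefined rule types can be sketched as a post-processing step over a community-ranked list. The function signature, the multiplicative form of boosting, and the choice to let a blacklist override a pin are assumptions for illustration; context scoping of rules is omitted for brevity.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def apply_rules(ranked: Sequence[Tuple[str, float]],
                pinned: Sequence[str] = (),
                blacklisted: Iterable[str] = (),
                boosts: Optional[Dict[str, float]] = None) -> List[str]:
    """Apply pinning, blacklisting, and boosting to (item, score) pairs."""
    blocked = set(blacklisted)
    boosts = boosts or {}
    # Boosting multiplies the community score, so community wisdom still counts.
    rescored = [(item, score * boosts.get(item, 1.0))
                for item, score in ranked
                if item not in blocked and item not in pinned]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    # Pinned items are forced to the front, in the order the owner listed them.
    forced = [item for item in pinned if item not in blocked]
    return forced + [item for item, _ in rescored]


community = [("a", 0.9), ("b", 0.8), ("c", 0.5)]
untouched = apply_rules(community)
promoted = apply_rules(community, pinned=["c"], blacklisted={"a"},
                       boosts={"b": 2.0})
boosted_only = apply_rules(community, boosts={"c": 2.0})
```

Because a boost only rescales the community score, a boosted item with very little community support can still rank below a strongly supported one, unlike a pin.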

User Profiling

A further type of system that the affinity engine can integrate with is user profiling systems 302. In its simplest form, a profiling system is a set of attributes, e.g. demographic attributes, associated with a user. These attributes can be passed along to the affinity engine on a recommendation request. As discussed earlier, the affinity engine can take these attributes into account when identifying a peer group, which then influences the recommendations that are returned. More sophisticated profiling systems require XML integration similar to other external components. Here, the affinity engine may have to dynamically contact the external profiling system during the peer identification process.

Preferred Embodiment

In the preferred embodiment of the invention, a Recommendation System is implemented in a Software as a Service (SaaS) Model to provide automatic suggestions that help Web site visitors find products or content they like or need. FIG. 2 is an architectural schematic diagram of a method and apparatus for context-based content recommendation. In FIG. 2, interaction of users with a Web site produces implicit emergent behaviors. Such user interaction includes page referrals, links, entry trails, queries, page sizes, mouse movements, peers, negative experiences, virtual bookmarks, time spent, virtual printing, exit trails, and the like. Such behaviors are processed in a recommendation/affinity engine according to the invention, resulting in automatic content and product recommendations in the form of social search and navigation guides, as well as providing real-time feedback to a merchant such as regarding visitor clubs, and identifying content gaps (each of which is discussed in greater detail below). The following key processes comprise the recommendation system:

Capturing Implicit Behavior: The implicit Web site behaviors 201 that serve as input to the recommendation system are captured client-side by small snippets of JavaScript code embedded within Web site pages, e.g. through a Web site template. The behaviors are then sent to the remote recommendation engine where they are processed using the full-spectrum behavior fingerprinting technology described herein.

Distilling Collective Wisdom: The recommendation/affinity engine 202 processes all incoming information to identify emerging intent among Web site visitors and captures the collective wisdom of the crowd. The recommendation engine identifies and learns affinities between and among users, content, and terms, identifying consistent patterns of behavior and removing noise. The affinity engine also recognizes and adjusts in real time to changing fads and trends, as well as cyclical patterns of behavior, such as seasonal patterns, and other shifts in user interest.

Delivering Recommendations and Search Results: When a visitor arrives at a page, the recommendation engine is passed information about the user's current context, either by snippets of JavaScript on the client, called Recommendation Tags, or by the Web site server, and a request is made for appropriate recommendations. The recommendation engine translates the user's context into intent using the technology described in this application, identifies the appropriate group of peers, and ultimately returns a set of content recommendations 203 ordered by their computed UseRank™. These recommendations are then displayed within the Web page in the user's browser. These recommendations may take the form of navigation links or enhanced search results. When delivering recommendation results, the recommendation system may additionally contact other external services, such as the Web site's full-text search engine, product catalog, or an ad server (see FIG. 3).

Customer Portal: In the preferred embodiment, owners of the Web site also have access to information and patterns learned by the recommendation engine through a customer portal called Insights 204. The customer portal provides the ability to configure the recommendation system and add merchandising rules, as well as view a set of reports providing information on the usage of the Web site by the community and the patterns of behavior and affinities learned by the system.

FIG. 13 shows a screenshot of the Customer Portal homepage in the preferred embodiment. Many different reports and configurations are accessible using links within this portal. For example, the invention allows for product/content gap detection, where gaps are detected based on usage patterns on the site and site owners are helped to introduce new products or content in those areas where the gaps are detected. As another example, the Customer Portal also provides the Web site owner with a set of reports to track the ongoing value that the Recommendation Engine is bringing to the Web site. FIG. 14 shows a screenshot of an A/B report that tracks the revenue lift due to the presence of recommendations on the Web site. The inventors have consistently seen revenue improvements of 20-30% or more. Other lift reports available within the portal include improvements in page views, engagement time, site stickiness, average order value, and user conversion to specified “business targets” within the Web site.

Applications

The following discussion involves different applications of the foregoing technology:

FIG. 15 is a screenshot showing community-guided e-commerce. In FIG. 15, five different capabilities of product recommendations are presented:

1) Comparative Products, also known as Similar or Competitive Products, are shown for the purpose of comparison shopping and up-selling based on the affinity of like-minded peers in their product considerations, i.e. the observed full-spectrum behavioral fingerprints discussed earlier. Consideration-based peer recommendations promote higher end/niche products, and yield more revenues and profits for a site. 2) Affiliated Products are also shown for cross-selling related/non-competitive products, such as accessories, to increase overall order size. 3) The invention also provides a showing of most popular products, such as products with the greatest appeal across the whole site or within a category. 4) The invention also generates intent-driven landing pages, also known as AdGuide and Site Concierge, with recommendations based on what a visitor searched for on standard Web search engines such as Google, Yahoo, or MSN. If a visitor searches for “Viking Cooktop” on Google and lands on a customer site, AdGuide dynamically serves the best Viking cooktop recommendations to the visitor on the landing page instead of showing him something irrelevant. 5) The invention also provides a community filtered search on products based on the foregoing.

For peer-driven recommendations, the invention automatically identifies small and large population segments having a unique interest and guides individuals to popular, competing, and accessory products. The invention provides a user intent and product mapping that translates shoppers' intent into peer-validated products and brands. The inventors have found a 700 percent conversion power based on independent studies. The invention provides fad and trend detection and recognizes and adapts to seasonal, promotional, and other shifts in shopper interest in real time. The invention also allows for product gap detection, where product gaps are detected and site owners are helped to introduce new products or content in those areas where the gaps are detected. The invention assists in merchandising with real-time customer feedback. It works with existing merchandising, promotions, segmentation, and search, and magnifies products or product families. The invention provides built-in concurrent A/B measurement and reports in real time the net revenue generated by the invisible crowds.

FIG. 16 is a screenshot showing community-guided product search with a travel example. In FIG. 16, the user has queried for hotels in New York City. Based on how similar users have used this Web site for similar purposes, and on observations of how those users have made use of the information returned, recommendations are made to the person with regard to hotels to select within New York City. Because the search results are based on like-minded peer interests and intent, they are far more effective and relevant than the many irrelevant results produced by simply matching keywords or metadata, in this case “New York Hotels”.

FIG. 17 is a screenshot showing community-guided marketing and online lead generation. As with the e-commerce application, five different capabilities are provided to a marketing Web site.

1) Aspects of this feature of the invention include a social search which involves UseRank™ and implicit learning. This approach is intent-driven and adaptive, and makes use of an implicit folksonomy, i.e. community terms such as the link-text users use and the search queries they enter, as discussed above. The invention supports audio, video, binary code, and all other content types because it does not have to parse the content itself, whereas a typical search engine must parse the content and is therefore limited to text content and metadata only. 2) The invention also provides Related or Similar Content based on the implicit behavioral feedback of like-minded users. The invention ranks the recommendations by usefulness and on-target-ness, based on the context and intent.

3) The invention also provides a most popular category, which provides content with the greatest value across the whole site or a sub-portion of the site. This promotion of information leads to business target conversion increases, such as trial/download/registration conversions, and may be provided at a site or topic level. 4) A Next Step feature is provided that concerns common next steps that lead to customer conversions. The next step may be any next step in a business process or a natural next set of content to read in connection with the process. 5) A similar Intent-Driven Landing Page, as with the e-commerce application discussed above, is available for a marketing Web site.

FIG. 18 is a screenshot showing social search according to the invention. In the example, the search is made for storage devices. Users are told whether each result is on target and for how many visitors, and are given a popularity rating and a community rank, i.e. UseRank™. An on-target value is also provided along with the results. Further, the results are sorted by usefulness, based on information within the affinity engine. The original keyword rank from the IR or full-text search engine is also provided as a before-and-after comparison. In this example, if the UseRank™ were not used, this good piece of primary storage product information would have been buried at rank 87, i.e. the seventh result on the ninth search result page, and no one would be able to find and use it. The UseRank™ has moved it to the third position on the first page of the search results. At the interface level, a Raw Results link is provided to give users the original results that would have been returned if this invention were not used. This is another way to see the before-and-after impact of the UseRank™.
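The before-and-after positions quoted in this example follow from simple page arithmetic. The following is a minimal sketch, assuming ten results per search result page; the page size is not stated in this description, and `page_and_position` is a hypothetical helper name used only for illustration.

```python
def page_and_position(rank, per_page=10):
    """Convert a 1-based overall rank into (page number, position on page).

    per_page=10 is an assumption; the description does not state the page size.
    """
    if rank < 1 or per_page < 1:
        raise ValueError("rank and per_page must be positive")
    # Work with a 0-based index so divmod splits cleanly into page and offset.
    page_index, offset = divmod(rank - 1, per_page)
    return page_index + 1, offset + 1
```

Under that assumption, `page_and_position(87)` yields page 9, position 7, and `page_and_position(3)` yields page 1, position 3, which agrees with the positions given in the example.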

FIG. 19 is a screenshot showing community-guided online support to promote self-service and satisfaction among the customer base. The page shows top issue resolutions, including most helpful articles, a top FAQ, a help forum, and blog articles. The page also shows related help, including in-depth articles on the subject, alternative solutions and, finally, reminders. Further, common next steps are provided, including natural next steps for resolution, common downloads, and additional support information and contact information. Finally, the invention shows social search, which is a key aspect of this embodiment of the invention, based on UseRank™ and implicit learning. Thus, social search is intent-driven and adaptive, makes use of the implicit folksonomy, and supports all content types. Other good support applications for this use include knowledge management (KM) portals, community forums and discussion threads, chats, wikis, developer's networks, etc.

FIG. 20 shows a community-guided intranet and knowledge portal. In this example, social search is used within an intranet. The search returns related pages, which include similar pages on the topic; the pages are proven by the past experience of peers, and serendipitous discovery is enabled. Further, the search filters out low-value content. The invention also provides most popular search results, which include content with the greatest value, top applications within the company, and information specific to a department or cross-company information. Finally, the invention provides next steps, including where people go from this page to meet their goals, a next step in the business process, and the next set of content to read on the topic. Extranets, such as physician-patient portals, dealer nets, and customer portals, are other example applications.

FIG. 21 shows recommendations implemented on a media site, and FIG. 22 shows social search implemented on a media site. FIG. 23 is a screenshot showing community-guided media and cross-site contextual recommendations. Similar Articles and Blogs are recommended based on similar contexts. If a visitor is reading a Tiger Woods golf story, three other related Tiger Woods news items, chosen based on the interests of similar peers, are promoted to that visitor. The visitor gets more content and is more satisfied. The site gets higher CPM and CPA, both of which are current industry measures of visitor conversion rates and site value from an advertising perspective. UseRank™, also known by Baynote as BrandRank™, can also be used to re-rank the ads and products on the media site, instead of placing ads or products randomly or based on keyword match. Ads validated by peers have significantly higher CPM and CPA values, and therefore increase revenue for the media sites. In addition, the information contained in these various sources is applied to the affinity engine, which discovers cross-site affinities among the various sites. The result is to provide targeted content and products.

FIG. 24 is an example of the invention being used to recommend video content. As discussed in this application, the invention is content agnostic and is able to learn affinities between terms, users, and content based entirely on implicit user actions.

FIG. 25 shows a screenshot of an Implicit Topic Cloud. Another application of the invention is to collect and display the community vocabulary or terms. On the surface, the Topic Cloud may appear to be similar to explicit tag clouds such as those of del.icio.us, social bookmarking, etc. The key difference is the implicit-ness. The Implicit Topic Cloud is built based on the actual users' activities on the site, i.e. the queries they entered and the link-text they used. Actions speak louder than words. The Implicit Topic Cloud reflects the interests of 100% of the site visitors, not only the few vocal visitors who are willing to tag things. The silent majority is the key to collecting the true wisdom of the community.

FIG. 26 shows an architectural view of an integration between the invention and an Ad Server. To serve the best ad to a user, the integrated system first contacts the existing Ad Server to get a list of acceptable ads to display to the user given demographic or other information. This list is then sent to the Affinity Engine which, based on learned affinities between the ads, users, and context, chooses the best ad to display to the current user. This allows the ads served to be appropriate both to the user and to the user's current context and intent.
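The two-stage selection described for FIG. 26 can be sketched as follows. This is an illustrative sketch only: the additive scoring rule, the `affinities` mapping, and the name `choose_best_ad` are assumptions, since the description does not specify how the Affinity Engine combines learned affinities.

```python
def choose_best_ad(candidate_ads, affinities, user_id, context_terms):
    """Pick the candidate ad with the highest combined affinity score.

    candidate_ads: ad ids already returned as acceptable by the ad server.
    affinities: hypothetical mapping {(ad_id, key): weight}, where key is a
        user id or a context term. Missing pairs count as 0.0.
    Returns None when the ad server offered no candidates.
    """
    def score(ad_id):
        # Assumed rule: sum the ad's affinity to the user and to each term.
        user_part = affinities.get((ad_id, user_id), 0.0)
        context_part = sum(affinities.get((ad_id, term), 0.0)
                           for term in context_terms)
        return user_part + context_part

    if not candidate_ads:
        return None
    # max() returns the first of any tied candidates, preserving server order.
    return max(candidate_ads, key=score)
```

The ad server remains the gatekeeper in this sketch: only ads it has already approved are ever scored, so the affinity step can reorder candidates but cannot introduce an unacceptable ad.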

There are many ways to leverage the community wisdom distilled by the Recommendation Engine, in addition to social search and onsite content/product recommendations. Many applications can be built or modified to directly tap into this wisdom through APIs to the Recommendation Engine. FIG. 4 is an architectural diagram showing the invention's role as a community wisdom platform. The community-wisdom platform can also include such items as contextual email marketing, mobile applications, IPTV systems, SEO/SEM applications, and mashups in custom applications. Within these systems, input from the Recommendation Engine can be used to dynamically determine the most appropriate information and/or organization of information to present to users. Such a platform is built upon Web services that include such features as AJAX and REST (discussed below). Thus, the invention is built upon new Web 2.0 fundamentals, which include a true understanding of invisible crowds, gathering information from like-minded peers, making content or product recommendations, providing onsite social search, implicit community-based reports, and the like. The following descriptions give details on how they are deployed:

Email Recommendations: Marketing uses mass emails to communicate with customers and prospects, but the content of the email campaign is typically determined by the marketing staff. The invention enhances the quality of email campaigns by crowd-sourcing from like-minded peers. The content of the emails is determined by what is popular with, or similar to, what the like-minded readers of the emails are interested in. Marketers simply facilitate the process of letting one customer implicitly promote content to another customer.

Mobile Recommendations: The quality demanded of mobile recommendations is even higher than on the Web due to the small footprint of mobile devices, such as cell phones and personal devices. The invention's like-minded peer and intent-driven approach is better than traditional approaches based on collaborative filtering, page views, and demographic or purchase data. Intent/context-driven peers predict people's needs far better than the competing approaches.

IPTV Recommendations: Similar recommendations are effective for viewers using Digital TV/HDTV to watch Internet-delivered movies and video programs. With thousands of movie and TV program selections, features like “You may also like these movies” are extremely important for up-selling and cross-selling movies and TV programs. Context-driven peer recommendations are the most effective way to produce recommendations that viewers like.

Live Connect: FIG. 27 shows an architectural view of the Live Connect system. Because the invention can identify like-minded peers, in addition to harvesting their collective wisdom implicitly and making peer recommendations to individuals, the invention can connect individuals with the like-minded peer group and have them exchange information, knowledge, and experience. This is similar to the real-world experience of like-minded peers who may not know each other. Imagine you are in Best Buy shopping for a TV: you see other like-minded peers also shopping for TVs, and you can ask them about their experience with certain products.

SEO/SEM Recommendations: The invention is also used for guiding Web site visitors at the moment of landing on the site. The features are called “AdGuide,” “Site Concierge”, and “Site Maitre'd.” This application examines the queries that people have searched on Google, Yahoo, MSN, ASK, and other major Web search engines, then uses them as a proxy for user intent to display the right product or content for visitors when they land on the site, either on the pre-designed landing pages (SEM) or on natural search result pages (SEO). For example, if a visitor searches for “Viking Cooktop” on Google and lands on a Baynote customer site, AdGuide dynamically serves the best Viking cooktop recommendations, based on the collective interests of like-minded peers, to the visitor on the landing page instead of showing him something irrelevant or less important chosen by simply matching keywords.

Based on the affinities learned by the affinity engine, the invention is also able to provide a Keyword Recommendation system that provides suggestions to marketers on which keywords should be purchased on external search engines as part of their SEO/SEM efforts. For example, the invention can suggest those terms that the community uses most often to describe content on a Web site, as well as those terms which are most likely to lead to useful content within the site. Also, given a particular collection of keywords already purchased, the invention can suggest other words used by the community that have high affinity to the existing set. These recommendations can be provided to SEO/SEM merchandisers through reports within the Customer Portal, or can be directly integrated with external SEO/SEM systems to bid on the recommended keywords dynamically and automatically.
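The suggestion of community terms with high affinity to an already purchased keyword set can be sketched as follows. This is a minimal sketch under stated assumptions: the symmetric `term_affinity` mapping, the summed-weight ranking, and the name `suggest_keywords` are hypothetical, and the description does not say how term-to-term affinity is actually computed.

```python
def suggest_keywords(purchased, term_affinity, limit=5):
    """Rank unpurchased community terms by total affinity to purchased ones.

    term_affinity: hypothetical mapping {(term_a, term_b): weight}, treated
        as symmetric; pairs that are absent count as 0.0.
    limit: maximum number of suggestions to return (an assumed default).
    """
    purchased = set(purchased)
    totals = {}
    for (a, b), weight in term_affinity.items():
        # Credit a candidate only when the other term is already purchased.
        if a in purchased and b not in purchased:
            totals[b] = totals.get(b, 0.0) + weight
        elif b in purchased and a not in purchased:
            totals[a] = totals.get(a, 0.0) + weight
    # Highest total first; alphabetical order breaks ties deterministically.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:limit]]
```

Terms that have no affinity to any purchased keyword never appear in the output, so the suggestions stay anchored to the existing campaign rather than drifting toward whatever is globally popular.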

Blogs, Forums, and Discussion Threads Recommendations: The invention is used with community-generated content such as blogs, forums, discussions, podcasts, video content, etc. Because of the volume of content involved, surfacing the right content in the right order is even more important than on expert-generated Web sites. Using like-minded peers for recommendations and social search is extremely important and effective. These Web sites can be public-facing blog sites, login-required sites such as partner portals or patient-physician portals, developer's networks, or intranets. Features such as “Similar content,” “Accessory products,” “Next steps,” “Most popular,” and social search are very useful and effective.

Insights (Visitor Clubs, Content Gaps, etc.): Insights, the Customer Portal, has already been discussed in part earlier in this application. Although a primary use of the invention is as an automatic recommendation system, there are times when site administrators may want to modify the behavior of the recommendation system or see reports based on the knowledge learned by the affinity engine. Insights currently offers three main categories of functionality through its UI: configuration, management, and reports. Configuration enables site administrators to configure fully all aspects of the recommendation engine required for full functionality and deployment, including integration with external sources of information such as full-text search engines and ad servers. Management enables site administrators to modify the behaviors of recommendations and social search in a holistic manner or specific to defined contexts, e.g. a particular search term, page location, or user. For example, administrators can create product/content promotions to override community wisdom for set periods of time, blacklist particular products/content from being recommended, artificially magnify or boost defined classes of products/content, or set up business rules such as limiting the classes of products/recommendations acceptable for recommendation in particular contexts. Reports offer various views on the community wisdom distilled by the affinity engine. Although the reports include a subset of the click-based and purchase-based information found in traditional analytics, they go beyond these in providing many details on the community wisdom distilled through full-spectrum behavioral fingerprinting and the connections learned by the affinity engine. Example reports are discussed below.
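The Management functions described above (promotions, blacklists, and boosts layered over community-ranked results) can be sketched as follows. This is an illustrative simplification, not the portal's rule engine: the function name, the per-item multiplier form of a boost, and the pin-to-top treatment of promotions are all assumptions, and the time limits and context scoping mentioned in the text are omitted.

```python
def apply_merchandising_rules(ranked, blacklist=(), boosts=None, promotions=()):
    """Adjust community-ranked items with administrator rules.

    ranked: list of (item_id, score) pairs from the recommendation engine.
    blacklist: item ids that must never be recommended.
    boosts: hypothetical {item_id: multiplier} used to magnify items.
    promotions: item ids pinned ahead of community results, in given order.
    """
    boosts = boosts or {}
    blocked = set(blacklist)
    # A blacklisted item stays excluded even if it is also promoted.
    promoted = [item for item in promotions if item not in blocked]
    pinned = set(promoted)
    rescored = [
        (item, score * boosts.get(item, 1.0))
        for item, score in ranked
        if item not in blocked and item not in pinned
    ]
    # list.sort is stable, so tied items keep the engine's original order.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return promoted + [item for item, _ in rescored]
```

In this sketch the blacklist takes precedence over promotions, which is one reasonable design choice when the two rules conflict; the description does not state which should win.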

The Content Gap report uses the affinity engine to analyze those user interests that are not being met by the existing content on the site. Using the full-spectrum behavioral fingerprinting technology, and by analyzing community behaviors as discussed earlier in this application, the affinity engine can identify community interests. The affinity engine can then analyze all cases in which a particular interest was expressed by a user and the subsequent behaviors of that user, including use of content and level of value extracted, as well as subsequent searches or exits. If a large portion of the user population with a particular interest is unable to find useful content, then that interest is designated a content gap. The degree to which the gap is suspected by the affinity engine can also be reported. The administrator can then take necessary actions to remedy the gap by adding new content and then continue to use the Content Gap report to see the effect on the presence of the gap.
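The content gap test described above can be sketched as a per-interest unmet fraction. This is a minimal sketch under stated assumptions: the `(interest, found_useful)` input shape, the `min_sessions` and `threshold` values, and the name `content_gaps` are hypothetical, and the description does not define how large "a large portion" is or how usefulness is measured.

```python
def content_gaps(sessions, min_sessions=20, threshold=0.6):
    """Return {interest: unmet_fraction} for interests that look like gaps.

    sessions: iterable of (interest, found_useful) pairs, one per case in
        which a user expressed that interest; found_useful is a bool.
    min_sessions and threshold are illustrative assumptions, not values
        taken from the description.
    """
    counts = {}
    for interest, found_useful in sessions:
        total, unmet = counts.get(interest, (0, 0))
        counts[interest] = (total + 1, unmet + (0 if found_useful else 1))
    gaps = {}
    for interest, (total, unmet) in counts.items():
        fraction = unmet / total
        # Require enough evidence before designating an interest a gap.
        if total >= min_sessions and fraction >= threshold:
            gaps[interest] = fraction
    return gaps
```

The returned fraction can serve as the reported degree to which a gap is suspected, and recomputing it after new content is added shows whether the gap is closing.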

The Visitor Clubs report provides the most comprehensive view on the affinities learned by the affinity engine. Visitors to the site are implicitly grouped based on shared interest. Each interest group, or visitor club, is displayed in the interface along with the number of visitors in the club, their overall activity level, the virtual folksonomy of terms that describe the club, and the content/products which are most useful to that club in the context of the club's interest. In addition, all club characteristics can be charted over time to gain an understanding of how the interest is trending. Clubs can also be compared to one another for overlap in membership. This report provides administrators with the ability to explore the learned associations between users, content/products, and terms/contexts. Such associations can then drive promotions or other merchandizing activities within the Management section of Insights. The visitor club information can also be used to drive various business decisions outside of the context of the invented system.
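The comparison of clubs for overlap in membership can be illustrated with a set-based measure. The sketch below uses the Jaccard index, which is a standard technique swapped in here for illustration; the description does not state which overlap measure the report uses, and `club_overlap` is a hypothetical name.

```python
def club_overlap(members_a, members_b):
    """Jaccard overlap between two visitor clubs' member sets (0.0 to 1.0)."""
    a, b = set(members_a), set(members_b)
    union = a | b
    if not union:
        # Two empty clubs share nothing; avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)
```

For example, two clubs with members {u1, u2, u3} and {u2, u3, u4} share two of four distinct visitors, giving an overlap of 0.5.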

Architecture Recommendation Server

FIG. 3 is an architectural schematic diagram showing affinity engine integration according to the invention. In FIG. 3, a current user exhibiting an observable behavior within a particular content site, such as current location and/or search, based on previous action, or user properties, accesses the affinity engine, which includes assets/products, users, and terms, all arranged in groupings based on affinities. The user is provided with asset recommendations in connection with products, categories, information, media, and the like. The affinity engine is populated by site owners who provide administrative input and exert administrative influence. The information is provided by XML/direct input. The administrative input and influence filters, constrains, and seeds the affinity engine, applies merchandising rules, and expresses goals, for example, likely use or likely purchase of a particular product. Asset information is also provided in the form of XML information, for example, a product catalogue or a full-text search facility. The affinity engine collects implicit observations from the visitor community to make such asset recommendations to a current user. These implicit observations include asset actions, such as time spent on a page, scrolling, mouse movements or clicks, page interaction, and virtual printing or bookmarking; asset navigation and asset navigation patterns, including, for example, entrance path, exit path, links used, repeated visits, frequency of visits, use of a back button, and recommendation use; and visitor-community searching, including terms entered and/or reentered, results clicked, results used, and subsequent navigation.

FIG. 28 is a block schematic diagram showing the system architecture of a preferred embodiment of the invention. A more detailed discussion of various aspects of this architecture is provided below. To understand FIG. 28 in greater detail, reference may be had to Applicant's copending U.S. patent application Ser. No. 11/319,928, filed Dec. 27, 2005, which is incorporated herein in its entirety by this reference thereto, and which describes, in greater detail, a preferred system for use in connection with the invention herein.

The architecture consists of a server farm 20, a customer enterprise 22, and a user browser 21. The user browser is instrumented with an extension 23 and accesses both customer servers 25 at the customer enterprise 22 and the server farm 20 via a load balancer 27. Communication with the server farm is currently effected using the HTTPS protocol. User access to the customer server is in accordance with the enterprise protocols. The browser extension 23 is discussed in greater detail below.

Of note in connection with the invention is the provision of a failsafe. The extension 23, as well as the enterprise extension 24, are constructed such that, if the server farm 20 does not respond in a successful fashion, the extension is shut down and the enterprise and browser interact in a normal manner. The features of the invention are only provided in the event that the server is active and performing its operations correctly. Therefore, failure of the server does not in any way impair operation of the enterprise for users of the enterprise.
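The failsafe behavior described above can be sketched as a wrapper that degrades to "no recommendations" whenever the server farm fails to answer correctly. This is an illustration in Python rather than the extension's own client-side code; the `fetch` callable, the timeout value, and the name `recommendations_or_nothing` are assumptions introduced for the example.

```python
def recommendations_or_nothing(fetch, timeout_seconds=0.5):
    """Call the recommendation service; on any failure return an empty list.

    fetch: hypothetical callable taking a timeout in seconds and returning a
        list of recommendations. An empty result means the page renders
        exactly as it would without the recommendation system.
    timeout_seconds: an assumed value; the description gives no timeout.
    """
    try:
        result = fetch(timeout_seconds)
    except Exception:
        # Server down, slow, or misbehaving: disable the feature for this page.
        return []
    # Treat a malformed (non-list) response as a failure too.
    return result if isinstance(result, list) else []
```

Because every failure path returns an empty list instead of raising, the enterprise page never waits on or breaks because of the remote service, which is the property the failsafe is meant to guarantee.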

As discussed above, an extension 24 is also provided for the enterprise, which communicates with the load balancer 27 at the server farm 20 via the HTTPS protocol.

The enterprise also includes a helper 26 which communicates with the server farm via an agency 31 using the HTTPS protocol. The agency retrieves log information from the enterprise and provides it to log analyzers 28, which produce a result that is presented to the usage repository 29. The agency also communicates with any data sources accessible to the enterprise for the purposes of loading information into the affinity engine 32. Such information includes catalog database information, use profile data, authentication data, as well as any other form of data that may be useful for the purposes of determining or filtering recommendation and search results.

Information is exchanged between the affinity engine 32 and the browser and enterprise via various dispatchers 30. The browser itself provides observations to the server and receives displays in response to search or recommendation requests therefrom. Alternatively, the customer server 25 can request search or recommendation results directly from the affinity engine 32 through the Baynote extension 24 using REST.

A key feature of the invention is the affinity engine 32, which comprises a plurality of processors 33/34 and a configuration administration facility 35. During operation of the invention, a form of information, also referred to as wisdom, is collected in a wisdom database 36.

Client-Side Integration

The preferred embodiment of the Baynote extension 23 uses a custom-built, tag-based AJAX architecture for collecting Web site behaviors from the client as well as serving recommendations to the client. FIG. 29 provides code snippets for JavaScript integration with a client. FIG. 30 is a block schematic diagram showing an AJAX tag platform according to the invention. In FIG. 30, a Web page is displayed on a client and includes a particular script. In this example, all files and results from the system are dynamically injected into the DOM as <script> elements. A coordinator on a trusted Web server provides such things as a failsafe mechanism, common code, a policy, handlers, and communication facilities. A system server provides such elements as a heartbeat, common JavaScript, policies, handlers, such as an observer and a guide, and the affinity engine. In FIG. 30, the coordinator is a piece of code that sits on the trusted Web site. In the presently preferred embodiment, the coordinator resides on a customer's Web site. Thus, there is a file that serves the coordinator role and that resides on a particular customer Web site, although it could reside on any other trusted Web site. This aspect of the invention concerns the notion of a trusted Web server for failsafe reasons. It is important to be certain that, if the invented system crashes, it does not crash or hold up the customer Web site. Thus, the coordinator code is provided as a failsafe procedure. It goes through a failsafe procedure to ensure proper recovery after a crash of the server's Web site.

The architecture of the code is broken down such that common code is used. Common code, or a piece of the code, is used for all types of recommendations. Thus, a tag might require common code. The policy may be customer specific or it can be user specific. This concerns how recommendations may be made. The policy information is loaded separately. Then, the code needed for specific tags is loaded as well. For example, if there is a tag for the most popular recommendation, and another one which is the recommendation, then these tags are loaded. The system then loads the code that is necessary for performing the particular tagging operations. A further type of tag or code is the observer that observes what users do, as discussed above. Users do not see the tag, but it is still a tag. The tags must also be able to communicate back to the system. Each of these functions is performed by a piece of code, for example JavaScript, that is loaded into the Web page. The JavaScript is loaded into the page through dynamic script injection.

This aspect of the invention recognizes that a Web page can become more dynamic. For example, there are things such as dynamic HTML and JavaScript which cause things to be added to a page dynamically. If a script tag needs a particular piece of information, then that code is injected into the page. For example, if a policy is needed, then a JavaScript tag is injected into the page that accesses the policy. The policy may require further injection of information, such as handlers. Therefore, the handlers are now dynamically injected as pieces of JavaScript into the page. Communications are handled in a similar manner.

One advantage of the foregoing approach is to provide the option of using first-party cookies or third-party cookies. The main Web page that the customer accesses may be the Web site of a merchant. The merchant communicates with the system to get the information with regard to recommendations and the like.

In this case, the merchant is the first party and the system is a third party. As a result of dynamic script injection, the invention allows the setting of cookies in both the first-party domain and in the third-party domain. The ability to set first and third-party cookies also allows the invention to operate in a cross domain.

Although the invention is described herein with reference to the preferred embodiment, one skilled in the art will readily appreciate that other applications may be substituted for those set forth herein without departing from the spirit and scope of the present invention. Accordingly, the invention should only be limited by the Claims included below.

1. A computer implemented context-centric content recommendation method, comprising: establishing a current context for a user at a Web site based upon inputs that are representative of user actions at the Web site; capturing the user's current context information with an observer tag that is embedded in the Web site; determining the user's interest based on various user behaviors collected by the observer tag; analyzing the user behaviors; storing all resulting information in a memory as the user's current context vector including a hybrid vector of terms and documents with weights on each entry reflecting how strongly that term or document reflects the user's current context; incrementing the context vector entries corresponding to terms and phrases selected as a user captures expressed interest; decrementing corresponding entries as user actions move further into the past; incrementing a corresponding vector entry for documents that a user selects, as determined based on a user's implicit actions; and generating a representation of the user's current context as a context vector for use in making recommendations of content to said user.